Background

As online services and devices become ever more central to modern life, the Internet infrastructure and service providers that connect our devices to those services have become increasingly important as well. Americans depend on fast, reliable broadband Internet to conduct their business every day.

Yet despite this necessity, millions of Americans have little to no access to broadband Internet, most of them in rural areas where Internet Service Providers deem it unprofitable to provide broadband service (Fingas 2018). A lack of fast broadband Internet also means a lack of new business investment in the community, since most modern businesses need fast, reliable Internet just to operate. As a result, many rural communities have taken matters into their own hands (Kang 2017).

These issues are magnified by the gross inaccuracies often found in reports on broadband connectivity, which give policymakers little motivation to act (Lenz 2018). Together with other factors such as unaffordable broadband prices, this exacerbates inequality by restricting opportunities for communication, education, and employment for those who need them most, such as unemployed citizens (Fingas 2018). This project visualizes these disparities in broadband access, and their relationship to inequality, using a selection of publicly available data.

Data

Name                            Rows   Vars  Unit
NYS Broadband Availability      1,635  24    Municipality
People Without Internet         821    23    Municipality
Urban Rate Broadband Survey     9,122  14    ISP in Market
American Community Survey 2017  3,108  36    County

People Without Internet (American Community Survey 2016)

ggplot(data = people_net, aes(x = percent_no_internet, y = percent_below_poverty)) + 
  geom_point(aes(size = P_total, color = region), alpha = 0.6) + 
  scale_color_manual(values = color_pal(4)) +
   scale_size(range = c(2, 8.5), labels = comma) +
  geom_smooth(method='lm',formula=y~x, color = "dark grey") + 
  labs(title = "Communities with Higher Poverty Rates Have Less Internet Access", 
       subtitle = "US Counties that have a higher percentage of people with no Internet also \nhave a higher percentage of residents below the poverty line", 
       x = "Percent with no Internet", 
       y = "Percent Below Poverty Line", 
       caption="American Community Survey 2016", 
       color = "Region", size = "Total Population") + 
  scale_x_log10(breaks = c(5, 10, 15, 20, 25, 30, 35, 40, 45, 50)) + 
  scale_y_log10(breaks = c(5, 10, 15, 20, 25, 30)) + 
  theme_master() + 
  theme(panel.grid.minor = element_blank())

As noted above, there is evidence that a lack of Internet access can exacerbate inequality. While the data here cannot establish a causal relationship, the strong correlation between poverty and lack of Internet access shows that the two are at least related. No clear trend by region or county population is visible in this relationship, which suggests that it holds across the United States.
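
The strength of this relationship can also be checked numerically. The sketch below is only an illustration: `toy_net` is a small made-up data frame standing in for `people_net` (only the two column names come from the plot code above), so the numbers it produces say nothing about the real data. It computes the Pearson correlation on the log10 scale used by the plot axes, along with the slope of the straight line that `geom_smooth(method = "lm")` fits on those axes.

```r
# Hypothetical stand-in for people_net; the values are illustrative only
toy_net <- data.frame(
  percent_no_internet   = c(5, 10, 15, 20, 30, 40),
  percent_below_poverty = c(6, 9, 12, 15, 20, 24)
)

# Correlate on the log10 scale, matching scale_x_log10() / scale_y_log10()
log_cor <- cor(log10(toy_net$percent_no_internet),
               log10(toy_net$percent_below_poverty))

# Slope of the linear fit on log-log axes
log_fit <- lm(log10(percent_below_poverty) ~ log10(percent_no_internet),
              data = toy_net)
log_slope <- unname(coef(log_fit)[2])
```

Running the same two calls on the real `people_net` columns would give the figures that back the "strong correlation" described above.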

ggplot(data = people_net, aes(x = percent_no_internet, y = percent_below_poverty)) + 
  geom_smooth(method='lm',formula=y~x, color = "dark grey") + 
  geom_point(aes(size = P_total, color = region), alpha = 0.6) + 
  scale_color_manual(values = color_pal(4)) +
  scale_size(range = c(1.5, 7.5), labels = comma) +
  facet_wrap(~ region) +
  labs(title = "The West and South Show Wider Variation in Poverty and Internet Access", 
       subtitle = "Compared with the Northeast and Midwest, the South and West have more counties at both extremes: \nlow poverty with high Internet access, and high poverty with low Internet access", 
       x = "Percent with no Internet", 
       y = "Percent Below Poverty Line", 
       caption="American Community Survey 2016", 
       color = "Region", size = "Total Population") + 
  scale_x_log10(breaks = c(5, 10, 20, 35, 50)) + 
  scale_y_log10(breaks = c(5, 10, 30)) + 
  theme_master()+ 
  theme(panel.grid.minor = element_blank())

Despite the greater variance in the South and West, the relationship between Internet access and poverty holds across the US. The wider range of population densities in the South and West may contribute to this greater spread. The extreme outlier in the lower left corner of the West panel is Douglas County, one of the ten most populous counties in Colorado.

Urban Broadband Survey

chart_labels = tibble(text = c('FCC \nBroadband \nCutoff', 'Trendline'), x = c(8.5, 750), y = c(25, 2500))

ggplot(data = urban_bb, aes(x = `Total Charge`, y = `Download Bandwidth Mbps`)) + 
  geom_smooth(method='lm', formula=y~x, color = "grey") + 
  geom_point(color = color_pal(1), alpha = 0.5, size = 2.5) + 
  geom_hline(yintercept = 25, linetype = 2, color = "white", size = 1) + 
  geom_label(data = chart_labels, aes(x = x, y = y, label = text), alpha = 0.95) + 
  scale_y_continuous(trans = "log10") +
  scale_x_continuous(trans = "log10") +
  labs(title = "Faster Download Speeds Mean Costlier Internet", 
       subtitle = paste("Even in urban areas, ", round(perc_not_bb, 0), "% of survey respondents from 2015 - 2018 had internet slower \nthan the 25 Mbps minimum download speed for broadband as defined by the FCC", sep = ""),
       x = "Total Monthly Internet Cost (USD)",
       y = "Download Speed (Mbps)", 
       caption = "Urban Rate Broadband Survey") +
  theme_master_dark() + 
  theme(panel.grid.minor = element_blank())

Existing literature suggests that rural areas have disproportionately less broadband Internet access than urban areas. However, even in urban areas, Internet access can be extremely expensive, including plans that fall short of the FCC’s 25 Mbps minimum download speed for broadband. While people can generally pay more to get faster speeds, high prices disproportionately affect lower-income families, who may be stuck with slow Internet because it is all they can afford.
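
The chart subtitle above interpolates `perc_not_bb`, whose definition is not shown. One plausible definition, offered here only as an assumption, is the percentage of survey rows whose download speed falls below the 25 Mbps cutoff. The sketch uses a made-up data frame, `toy_bb`, in place of `urban_bb`; only the column name is taken from the plotting code.

```r
# Hypothetical stand-in for urban_bb with only the column needed here
toy_bb <- data.frame(
  `Download Bandwidth Mbps` = c(3, 10, 25, 50, 100, 1000, 15, 200),
  check.names = FALSE  # keep the spaces in the column name
)

# FCC minimum download speed for broadband, as used in the charts
fcc_cutoff_mbps <- 25

# Percentage of rows strictly below the cutoff (assumed definition)
perc_below <- mean(toy_bb$`Download Bandwidth Mbps` < fcc_cutoff_mbps) * 100
```

A plan at exactly 25 Mbps counts as broadband under this definition, which is why the comparison is strict.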

ggplot(data = filter(urban_bb, Technology != "Other"), aes(x = `Total Charge`, y = `Download Bandwidth Mbps`)) + 
  geom_smooth(method='lm', formula=y~x, color = "grey") + 
  geom_point(aes(color = Technology), alpha = 0.5, size = 2) + 
  scale_color_manual(values = color_pal(4)) +
  geom_hline(yintercept = 25, linetype = 2) + 
  facet_wrap(~ Technology) +
  scale_y_continuous(trans = "log10") +
  scale_x_continuous(trans = "log10") +
  labs(title = "Fixed Wireless Providers Consistently Deliver the Worst Value",
       subtitle = "Regardless of technology, pricing and speeds vary greatly across America",
       x = "Total Monthly Internet Cost (USD)",
       y = "Download Speed (Mbps)", 
       caption = "Urban Rate Broadband Survey") +
  theme_master() +
  hide_legend + 
  theme(panel.grid.minor = element_blank())

Naturally, download speeds vary with the technology that Internet Service Providers use in their networks. Fiber to the Home offers the highest speeds, followed by cable, and the two have similar price-to-speed ratios. DSL is typically much slower but also less expensive, while Fixed Wireless is, overall, more expensive than other technologies at the same speeds.

ggplot(agg_urban_bb, aes(area = as.numeric(counts), fill = down_value, label = Group.1, subgroup = Technology)) + 
  geom_treemap() +
  geom_treemap_subgroup_border() +
  geom_treemap_subgroup_text(place = "centre", grow = TRUE, alpha = 0.5,
                             colour = "black", fontface = "italic", min.size = 0) +
  geom_treemap_text(colour = "white", place = "centre",
                    grow = TRUE) +
  scale_fill_gradientn(trans = "log10", colors = color_pal(5, type = "continuous")) + 
  labs(title = "Cable and Fiber ISPs Provide Better Value for Internet Service",
       subtitle = "Small ISPs that use Fiber to the Home are the best value",
       fill = "Value \n(Megabits \nper USD)", 
       caption = "Urban Rate Broadband Survey") +
  theme_master()

Overall, Fiber to the Home provides the best value for consumers, with cable not far behind, while DSL and Fixed Wireless offer the worst value on average. Small FTTH ISPs, like Google Fiber, typically provide better value than larger fiber ISPs; for the other technologies the pattern reverses, and larger ISPs usually provide better value.
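
The "value" shown in the treemap is labeled as megabits per USD. How `agg_urban_bb` was actually built is not shown, so the following is only a sketch of one way such a metric could be computed and averaged by technology. The data frame `toy_plans` and its columns `mbps` and `charge_usd` are hypothetical, and the numbers are made up.

```r
# Hypothetical plans; values are illustrative, not survey data
toy_plans <- data.frame(
  Technology = c("FTTH", "FTTH", "Cable", "Cable", "DSL", "Fixed wireless"),
  mbps       = c(1000, 100, 200, 100, 10, 10),
  charge_usd = c(100, 50, 100, 50, 40, 50)
)

# Value of each plan: download megabits per second per dollar of monthly cost
toy_plans$value <- toy_plans$mbps / toy_plans$charge_usd

# Mean value per technology
value_by_tech <- aggregate(value ~ Technology, data = toy_plans, FUN = mean)
```

Because speeds span several orders of magnitude, a median or a log-scale average may be a more robust summary than the mean used here.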

ggplot(data = urban_bb, aes(x = `Total Charge`, y = `Download Bandwidth Mbps`)) + 
  geom_smooth(method='lm', formula=y~x, color = "grey") + 
  geom_point(aes(color = ordered(Year)), alpha = 0.5, size = 2) + 
  scale_color_manual(values = color_pal(4, "continuous")) +
  geom_hline(yintercept = 25, linetype = 2) + 
  facet_wrap(~ Year) +
  scale_y_continuous(trans = "log10") +
  scale_x_continuous(trans = "log10") +
  labs(title = "Faster Download Speeds are Becoming More Prevalent",
       subtitle = "Even as faster speeds have become more common over time, the relationship \nbetween price and download speed has changed very little.",
       x = "Total Monthly Internet Cost (USD)",
       y = "Download Speed (Mbps)",
       color = "Year",
       caption = "Urban Rate Broadband Survey") +
  theme_master() +
  hide_legend + 
  theme(panel.grid.minor = element_blank())

ggplot(data = urban_bb, aes(x = factor(`Year`), y = `Total Charge`)) +
  geom_violin(aes(color = ordered(Year), fill = ordered(Year))) +
  scale_color_manual(values = color_pal(4, "continuous")) +
  scale_fill_manual(values = color_pal(4, "continuous")) +
  scale_y_continuous(trans = "log10", breaks = c(10, 30, 50, 100, 150, 300, 1000)) +
  labs(title = "Monthly Internet Costs Have Increased Slightly",
       subtitle = "Most people paid around $50 for Internet access every month in 2015-2018. \nHowever, fewer people paid less than $50, and more paid $50-100, in 2018 than in 2015.",
       x = "Year",
       fill = "Year",
       color = "Year",
       y = "Total Monthly Internet Cost (USD)",
       caption = "Urban Rate Broadband Survey") +
  theme_master() +
  theme(legend.position="none") + 
  theme(panel.grid.minor = element_blank())

chart_labels = tibble(text = c('FCC \nBroadband \nCutoff'), x = c(0.75), y = c(25))

ggplot(data = urban_bb, aes(x = factor(`Year`), y = `Download Bandwidth Mbps`)) +
  geom_violin(aes(color = ordered(Year), fill = ordered(Year))) +
  scale_color_manual(values = color_pal(4, "continuous")) +
  scale_fill_manual(values = color_pal(4, "continuous")) +
  scale_y_continuous(trans = "log10", breaks = c(1, 5, 20, 50, 100, 300, 1000, 10000)) +
  geom_hline(yintercept = 25, linetype = 2,  color = "white", size = 1) + 
  geom_label(data = chart_labels[1,], aes(x = x, y = y, label = text), alpha = 0.95) + 
  labs(title = "Download Speeds Have Increased Significantly",
       subtitle = "High-speed Internet subscriptions, especially Gigabit plans, increased noticeably between 2015 and 2018. \nStill, many people remained on Internet plans with speeds below the FCC minimum for broadband \nof 25 Mbps in 2018",
       x = "Year",
       fill = "Year",
       color = "Year",
       y = "Download Speed (Mbps)",
       caption = "Urban Rate Broadband Survey") +
  theme_master_dark() +
  theme(legend.position="none") + 
  theme(panel.grid.minor = element_blank())

Even though prices have increased slightly, there is a clear trend toward people purchasing faster Internet over time. This may indicate either that people are upgrading as faster speeds become available, or that faster Internet has become more of a necessity, making people more willing to pay for it. Regardless, more people had Internet that met the FCC minimum for broadband in 2018 than in 2015.

chart_labels = tibble(text = c("Cable", "DSL", "Fixed wireless", "FTTH", "Other"), x = rep(c(2016), times = 5), y = c(0.47, 0.27, 0.16, 0.07, 0.00))

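# Add a `percent` column: each row's count as a share of its category total
# (for example, each Technology count divided by the total for that Year).
# `category` and `n_var` are column names given as strings.
get_perc = function(data_set, category, n_var) {
  # ave() returns, for every row, the sum of n_var within that row's category
  group_totals = ave(data_set[[n_var]], data_set[[category]], FUN = sum)
  data_set$percent = data_set[[n_var]] / group_totals
  return(data_set)
}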

urban_bb[c(2,5)] %>% table() %>% as_tibble() %>% get_perc("Year", "n") %>%
  ggplot(aes(x = as.numeric(Year), color = Technology, y = as.numeric(percent))) + 
  scale_color_manual(values = color_pal(5)) +
  geom_line(size = 1.5) + geom_point(size = 2) +
  geom_label(data = chart_labels, aes(x = x, y = y, label = text), color = color_pal(5), family = "Pragati Narrow", fontface = "bold", size = 6) +
  labs(title = "More People are Choosing Fiber to the Home and Fixed Wireless Internet",
       subtitle = "The share of survey respondents who had Cable or DSL Internet decreased between 2015 and 2018",
       x = "Year", 
       y = "Percentage of Respondents",
       caption = "Urban Rate Broadband Survey") +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1), breaks = c(0, 0.1, 0.2, 0.3, 0.4, 0.5)) +
  theme_master() + 
  hide_legend + 
  theme(panel.grid.minor.x = element_blank())

More people chose Fiber to the Home for their Internet access in 2018 than in previous years. This suggests that fiber is becoming more accessible, either through increased investment in fiber infrastructure by ISPs or through more competitive pricing by ISPs that use Fiber to the Home technology.

chart_labels = tibble(text = c("No cap", "Data cap"), x = rep(c(2017), times = 2), y = c(0.56, 0.36))

urban_bb$usage_limit = (urban_bb$`Usage Allowance GB` != "Unlimited")
urban_bb[c(2,19)] %>% table() %>% as_tibble() %>% get_perc("Year", "n") %>%
  ggplot(aes(x = as.numeric(Year), color = usage_limit, y = percent)) + 
  scale_color_manual(values = color_pal(2, reverse = TRUE), labels = c("No", "Yes")) +
  geom_label(data = chart_labels, aes(x = x, y = y, label = text), color = color_pal(2, reverse = TRUE), family = "Pragati Narrow", fontface = "bold", size = 6) +
  geom_line(size = 1.5) + geom_point(size = 2) +
  labs(title = "Data Caps are Not Disappearing",
       subtitle = "The share of survey respondents who had Internet service with a monthly data cap \nneither grew nor shrank substantially between 2015 and 2018",
       x = "Year", 
       color = "Has Data Cap",
       y = "Percentage of Respondents",
       caption = "Urban Rate Broadband Survey") +
  scale_y_continuous(labels = percent_format(), limits = c(0, 0.65)) +
  theme_master() + 
  hide_legend + 
  theme(panel.grid.minor.x = element_blank())

Data caps restrict the Internet service a customer pays for by allowing only a certain amount of data to be transferred per month. This not only disproportionately affects lower-income families; it can also make it difficult or impossible to “cut the cord” and switch from TV to video streaming services. Data caps also raise costs for small businesses, which may need to upgrade to “business” Internet to avoid overage charges.

New York State Broadband Availability

ny_county_bb$num_prov = ny_county_bb$num_prov %>% ordered()
levels(ny_county_bb$num_prov) = list("1" = c(1), "2" = c(2), "3" = c(3), "4" = c(4), "5" = c(5), "6+" = c(6,7,8,9,10,11,12,13) )
ggplot(data = subset(ny_county_bb, tech != "Wireline"), 
       aes(y = pop_dens, x = num_prov)) + 
  geom_violin(aes(fill = num_prov, 
                  color = num_prov)) + 
  scale_fill_manual(values = color_pal(6)) +
  scale_color_manual(values = color_pal(6)) +
  scale_y_continuous(trans = "log10") +
  facet_wrap(~ tech) +
  labs(title = "Urban Counties Have More Internet Choices",
       subtitle = "However, multiple DSL providers are more likely in rural counties",
       x = "Number of Internet Service Providers per County", 
       y = "Population Density of County (ppl/sq mi)",
       caption = "New York State Broadband Availability By Municipality") + 
  theme_master() +
  theme(legend.position="none") + 
  theme(panel.grid.minor = element_blank())

This data from New York offers some evidence that more densely populated counties have more ISPs. This suggests that ISPs are hesitant to enter rural markets, either because of high barriers to entry or because existing ISPs protect their investment through tactics like exclusivity agreements with municipalities.

avg_wired_int = sum(subset(ny_county_bb, tech == "Wireline")$`2010 Muni Population` * subset(ny_county_bb, tech == "Wireline")$perc_homes) / sum(subset(ny_county_bb, tech == "Wireline")$`2010 Muni Population`)

perc_homes_temp = cut(subset(ny_county_bb, tech == "Wireline")$perc_homes, 
                   breaks = c(0.75, 0.8, 0.85, 0.9, 0.95, 1), 
                   labels = c("75-79%", "80-84%", "85-89%","90-94%", "95-100%"))

ny_map + geom_sf(data = subset(ny_county_bb, tech == "Wireline"), 
                 aes(fill = perc_homes_temp,
                     color = perc_homes_temp,
                     alpha = pop_dens),
                 size = 0.7) + 
  scale_fill_manual(values = color_pal(5, type = "cool")) +
  scale_color_manual(values = color_pal(5, type = "cool")) +
  scale_alpha(trans = "log", range = c(0.1, 1), breaks = c(15, 250, 4000, 45000), labels = c("15", "250", "4,000", "45,000")) +
  guides(color = guide_legend(reverse=T), fill = guide_legend(reverse=T)) +
  labs(title = "Rural Counties have Less Wired Internet Coverage",
       subtitle = paste("Yet, ", round(avg_wired_int * 100), "% of the New York population has access to wired internet service", sep = ""),
       fill = "Percent with \nWired Internet",
       color = "Percent with \nWired Internet",
       alpha = "Population Density \n(ppl/sq mi)",
       caption = "New York State Broadband Availability By Municipality") +
    theme_map() +
  theme(axis.text = element_blank(),
        legend.position = "right")

avg_fiber_int = sum(subset(ny_county_bb, tech == "Fiber")$`2010 Muni Population` * subset(ny_county_bb, tech == "Fiber")$perc_homes) / sum(subset(ny_county_bb, tech == "Fiber")$`2010 Muni Population`)

avg_cable_int = sum(subset(ny_county_bb, tech == "Cable")$`2010 Muni Population` * subset(ny_county_bb, tech == "Cable")$perc_homes) / sum(subset(ny_county_bb, tech == "Cable")$`2010 Muni Population`)

avg_dsl_int = sum(subset(ny_county_bb, tech == "DSL")$`2010 Muni Population` * subset(ny_county_bb, tech == "DSL")$perc_homes) / sum(subset(ny_county_bb, tech == "DSL")$`2010 Muni Population`)

avg_wireless_int = sum(subset(ny_county_bb, tech == "Wireless")$`2010 Muni Population` * subset(ny_county_bb, tech == "Wireless")$perc_homes) / sum(subset(ny_county_bb, tech == "Wireless")$`2010 Muni Population`)

perc_homes_temp = cut(subset(ny_county_bb, tech != "Wireline")$perc_homes, 
                   breaks = c(-0.1, 0.2, 0.4, 0.6, 0.8, 1), 
                   labels = c("0-19%", "20-39%", "40-59%", "60-79%", "80-100%"))

ny_map + geom_sf(data = subset(ny_county_bb, tech != "Wireline"), 
                 aes(fill = perc_homes_temp,
                     color = perc_homes_temp,
                     alpha = pop_dens),
                 size = 0.4) + 
 scale_fill_manual(values = color_pal(5, type = "continuous")) +
  scale_color_manual(values = color_pal(5, type = "continuous")) +
  scale_alpha(trans = "log", range = c(0.1, 1), breaks = c(15, 250, 4000, 45000), labels = c("15", "250", "4,000", "45,000")) +
  guides(color = guide_legend(reverse=T), fill = guide_legend(reverse=T)) +
  facet_wrap(~ tech) +
  labs(title = "Fiber Internet is Almost Exclusively Available to the Metro NYC Area",
       subtitle = paste("Only ", round(avg_fiber_int * 100),
                        "% of the New York population has access to fiber internet service \ncompared to ", 
                        round(avg_cable_int * 100), "% for cable, " , 
                        round(avg_dsl_int * 100), "% for DSL, " , 
                        round(avg_wireless_int * 100), "% for wireless (cellular) service" , sep = ""),
       fill = "Percent with \nInternet Tech",
       color = "Percent with \nInternet Tech",
       alpha = "Population Density \n(ppl/sq mi)",
       caption = "New York State Broadband Availability By Municipality") +
    theme_map() +
  theme(axis.text = element_blank(),
        legend.position = "right")

While New York has good cable Internet coverage overall, the state’s own broadband survey still shows significant disparities in cable Internet access between rural and urban counties. Fiber is far more limited: only a few counties in New York City and the surrounding metro area have high fiber Internet availability.
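
The statewide percentages quoted in the map subtitles are population-weighted averages of each municipality's coverage, computed above as a sum of products divided by a sum of weights. Base R's `weighted.mean()` expresses the same calculation more directly. The sketch below uses a made-up two-row data frame, `toy_muni`, with a hypothetical `population` column standing in for `2010 Muni Population`. Weighting a share of homes by population is itself an approximation.

```r
# Hypothetical municipalities: a small town with 50% coverage and a
# large city with full coverage
toy_muni <- data.frame(
  population = c(1000, 9000),
  perc_homes = c(0.50, 1.00)
)

# Population-weighted coverage, equivalent to
# sum(population * perc_homes) / sum(population)
weighted_cov <- weighted.mean(toy_muni$perc_homes, w = toy_muni$population)

# Unweighted mean for comparison; it treats both places as equally large
unweighted_cov <- mean(toy_muni$perc_homes)
```

In this toy case the weighted figure is 95% while the unweighted one is 75%, which illustrates how a statewide average can look high even when many small rural municipalities have poor coverage.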

US Census - American Community Survey 2017

avg_us_int = 1 - (sum(acs_17_mod$total_pop * acs_17_mod$no_inet) / sum(acs_17_mod$total_pop))

states_sf %>% 
  ggplot() + 
  geom_sf(color = "grey90", fill = "white") +
  coord_sf(crs = st_crs("ESRI:102003")) +
  geom_point(data = acs_17_mod, aes(x = X,
                     y = Y,
                     color = (1- no_inet),
                     size = pop_dens)) + 
  scale_color_gradientn(colors = color_pal(6, type = "cool"), labels = percent_format(accuracy = 1)) +
  scale_size(range = c(0.5, 15), breaks = c(50, 500, 5000, 50000)) +
  scale_alpha(range = c(1.0, 0.05)) +
  labs(title = paste(round(avg_us_int * 100), "% of America has Internet Access", sep = ""),
       subtitle = "Even in rural counties, the majority of households have internet access",
      color = "Percent with \nInternet",
       size = "Population Density \n(ppl/sq mi)",
      caption = "American Community Survey 2017") +
   theme_map() +
  theme(axis.text = element_blank())

states_sf %>% 
  ggplot() + 
  geom_sf(color = "grey30", fill = "#323232") +
  coord_sf(crs = st_crs("ESRI:102003")) +
  geom_point(data = acs_17_mod, aes(x = X,
                     y = Y,
                     color = broadband_any,
                     size = pop_dens)) + 
  scale_color_gradientn(colors = color_pal(6, type = "warm", reverse = TRUE), labels = percent_format(accuracy = 1)) +
  scale_size(range = c(0.5, 15), breaks = c(50, 500, 5000, 50000)) +
  scale_alpha(range = c(1.0, 0.05)) +
  labs(title = "Rural Counties have Less Access to Broadband Internet",
       subtitle = "Even in urban areas, many households, while having Internet, do not have broadband Internet access",
      color = "Percent with \nBroadband",
       size = "Population Density \n(ppl/sq mi)",
      caption = "American Community Survey 2017") +
  theme_map() +
  theme(axis.text = element_blank())

In rural counties in the United States, most households have Internet access, but many lack a connection fast enough to be considered broadband. This varies across the country, however. For example, Utah has much higher rates of both overall Internet access and broadband access than many other rural states. In fact, despite its vastness, the western United States overall has much better access to high-speed Internet than other areas, most notably the rural South.

Works Cited

Fingas, Jon. 2018. “Microsoft Says the Rural Broadband Divide Is Worse Than You Think.” Engadget. https://www.engadget.com/2018/12/05/microsoft-study-on-rural-broadband-shortfall/.

Kang, Cecilia. 2017. “How to Give Rural America Broadband? Look to the Early 1900s.” The New York Times. https://www.nytimes.com/2016/08/08/technology/how-to-give-rural-america-broadband-look-to-the-early-1900s.html.

Lenz, Lyz. 2018. “Iowa: Rural Broadband, and the Unknown Costs of the Digital Divide.” Columbia Journalism Review. https://www.cjr.org/special_report/midterms-2018-iowa-rural-broadband.php/.